-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for selecting/excluding group slices #5198
base: develop
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request introduces significant updates to the FiftyOne framework's grouped datasets functionality. The changes encompass various files, focusing on enhancing group slice management, updating documentation, and refining dataset view handling. Key modifications include the addition of methods to exclude group slices, enhancements to the selection of group slices, and a comprehensive update to the documentation that clarifies the creation, manipulation, and usage of grouped datasets. Changes
Possibly related PRs
Suggested Labels
Suggested Reviewers
Poem
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
24359cd
to
73a775b
Compare
73a775b
to
9b8c693
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
fiftyone/core/stages.py (1)
5037-5040
: Consider simplifying the if/else
Use a ternary operator for concise readability.Proposed change:
-if etau.is_str(self._slices): - slices = {self._slices} -else: - slices = set(self._slices) +slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)🧰 Tools
🪛 Ruff (0.8.2)
5037-5040: Use ternary operator
slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
instead ofif
-else
-blockReplace
if
-else
-block withslices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
(SIM108)
fiftyone/__public__.py (1)
201-201
: Remove or utilize the unused import "ExcludeGroupSlices"
Static analysis indicates that "ExcludeGroupSlices" is imported on line 201 but never used. If this import is genuinely unnecessary, you can safely remove it to keep the codebase clean. Otherwise, consider referencing the symbol to avoid the unused-import warning.🧰 Tools
🪛 Ruff (0.8.2)
201-201:
.core.stages.ExcludeGroupSlices
imported but unusedRemove unused import
(F401)
tests/unittests/group_tests.py (1)
645-764
: Consider extracting common assertion patterns into helper methods.While the test is comprehensive and well-structured, there are repeated assertion patterns that could be extracted into helper methods to improve maintainability and reduce code duplication. For example:
+ def _assert_group_view_state(self, view, expected_len, expected_slices, expected_media_types): + self.assertEqual(len(view), expected_len) + self.assertEqual(view.media_type, "group") + self.assertSetEqual(set(view.group_slices), set(expected_slices)) + self.assertDictEqual(view.group_media_types, expected_media_types) + self.assertIn(view.group_slice, expected_slices) + self.assertIn(view.default_group_slice, expected_slices) def test_select_exclude_slices(self): dataset = _make_group_dataset() # Select slices by name view = dataset.select_group_slices(["left", "right"], flat=False) - self.assertEqual(len(view), 2) - self.assertEqual(view.media_type, "group") - self.assertSetEqual(set(view.group_slices), {"left", "right"}) - self.assertDictEqual( - view.group_media_types, {"left": "image", "right": "image"} - ) - self.assertIn(view.group_slice, ["left", "right"]) - self.assertIn(view.default_group_slice, ["left", "right"]) + self._assert_group_view_state( + view, + expected_len=2, + expected_slices=["left", "right"], + expected_media_types={"left": "image", "right": "image"} + )This would make the test more maintainable and easier to update if the assertions need to change.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
docs/source/user_guide/groups.rst
(3 hunks)fiftyone/__public__.py
(1 hunks)fiftyone/core/collections.py
(6 hunks)fiftyone/core/dataset.py
(2 hunks)fiftyone/core/stages.py
(16 hunks)fiftyone/core/view.py
(13 hunks)tests/unittests/group_tests.py
(2 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/__public__.py
201-201: .core.stages.ExcludeGroupSlices
imported but unused
Remove unused import
(F401)
fiftyone/core/stages.py
5037-5040: Use ternary operator slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
instead of if
-else
-block
Replace if
-else
-block with slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
(SIM108)
🔇 Additional comments (41)
fiftyone/core/view.py (15)
165-165
: Looks good!
This logic correctly returns False when the stage flattens groups.
247-247
: No issues found.
This fallback expression is appropriate.
274-274
: Straightforward return statement
This line upholds clarity by returning the group media types’ keys.
284-284
: Minor note
The code cleanly returns the group media types without issues.
294-296
: No issues found
Returning the slice from the grouped dataset if the default slice doesn’t exist is logical.
1563-1563
: No concerns
Context variable "_found_flattened_videos" is introduced without any apparent problems.
1579-1584
: Looks correct
The code checks whether groups are flattened, ensuring correct frame reattachment logic.
1612-1612
: Comment block only
No review comment needed.
1617-1617
: Comment only
No direct code to review.
1651-1652
: Logic is appropriate
Manually attaching frames after group lookup is valid.
1704-1704
: No issues
Correctly setting the group slice.
1817-1825
: Code reads well
This handles updated group media types and adjusts the slice if needed.
1833-1835
: Simple setter
No problems with the group slice assignment.
1939-1949
: All good
The function iterates over stages, updating group media types. No issues.
1976-1976
: Trivial single-line snippet
No further remarks.
fiftyone/core/stages.py (8)
101-112
: Clear docstring
Property explains when the stage does or does not flatten groups.
225-240
: Method is well-defined
Provides group media types or returns None. Looks consistent.
Line range hint 4581-4654
: Documentation only
Explains SelectGroupSlices with examples. No code issues found.
4666-4668
: Docstring lines
No issues with these doc comments.
4932-5082
: Implementation looks coherent
This new class ExcludeGroupSlices is consistent with the codebase.
🧰 Tools
🪛 Ruff (0.8.2)
5037-5040: Use ternary operator slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
instead of if
-else
-block
Replace if
-else
-block with slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
(SIM108)
8847-8847
: No actionable code
Just a list containing ExcludeGroupSlices.
8895-8895
: No actionable code
Simple registry addition for ExcludeGroupSlices.
8912-8912
: No actionable code
Again, references the new class in the registry.
fiftyone/core/dataset.py (1)
8494-8496
: Enforce restriction on dynamic grouped views
This check effectively prevents cloning datasets that rely on dynamic grouping logic. By raising an exception here, the code avoids deeper inconsistencies in the pipeline. The implementation is straightforward and properly guards against a feature that isn't supported.
docs/source/user_guide/groups.rst (14)
818-819
: Clarify the usage of flat vs. grouped views
These lines efficiently explain how users can select individual slices from their grouped datasets and flatten them into ungrouped collections. The documentation is concise and appears accurate.
846-846
: Highlighting multi-slice selection
This line clarifies that you can select multiple slices (e.g., "left" and "right"). It is clear and helpful.
885-891
: Emphasize that prior filters don't affect unseen slices
These lines accurately warn that filtering a single slice doesn't automatically carry over to other slices. This is a crucial note for new users.
892-901
: Encourage filtering after flattening
This explanation neatly instructs users to apply filters post-flattening if their goal is to filter slices collectively. It's well-structured.
902-909
: Good demonstration of layering multiple filters
The code snippet clarifies how to refine the flattened collection further with additional filters. No issues noted.
910-911
: Reiterating usage of select_group_slices with flat=False
This reference is consistent with earlier docs. Clear and consistent.
912-912
: Small segue to next section
This line is functioning as a transition. No issues.
917-917
: Loading a grouped view while excluding slices
Providing an example that merges well with the new features. Good clarity.
919-920
: Maintaining grouped media type
Line references demonstrate the dataset remains “group” after excluding a slice. Looks accurate.
922-923
: New subheading for excluding slices
The heading is appropriately placed and clearly states the feature’s purpose.
924-928
: Intro to exclude_group_slices
The explanation is direct and succinct. Perfect for users who need to exclude certain slices.
929-929
: Transition line
No issues found with this bridging text.
934-935
: Excluding a single slice
Well-structured code snippet showcasing removing "center." The example is consistent with the new method.
936-937
: Validation of new slices post exclusion
The pair of assertions is straightforward, verifying the reduced set of slices.
tests/unittests/group_tests.py (1)
591-602
: LGTM! Well-structured schema validation test.
The test effectively validates that schema changes are maintained when selecting active slices, with thorough checks for field removal from both sample and frame schemas.
fiftyone/core/collections.py (2)
4621-4694
: LGTM! Well-implemented group slice exclusion functionality
The new exclude_group_slices()
method provides a clean and intuitive way to exclude specific group slices or media types from a collection. The implementation is robust and the documentation includes clear examples.
Line range hint 6671-6773
: LGTM! Good enhancement to select_group_slices with flat parameter
The addition of the flat
parameter to select_group_slices()
provides more flexibility by allowing users to choose between flattened and grouped views. The implementation maintains backward compatibility by defaulting to flat=True
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
fiftyone/core/dataset.py (1)
8513-8515
: Ensure consistent group field data when cloningCopying the group field and its associated metadata from the source collection maintains a coherent group context in the cloned dataset. However, consider verifying that these fields do not conflict with any existing group slices. Additionally, unit tests that cover cloning both grouped and non-grouped datasets can help guard against regressions.
fiftyone/core/stages.py (1)
4969-5119
: LGTM! Well-implemented new ExcludeGroupSlices class.The new
ExcludeGroupSlices
class is well-designed with:
- Clear documentation and examples
- Proper parameter validation
- Consistent implementation with other group-related stages
However, consider using a ternary operator for better readability:
- if etau.is_str(self._slices): - slices = {self._slices} - else: - slices = set(self._slices) + slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)🧰 Tools
🪛 Ruff (0.8.2)
5074-5077: Use ternary operator
slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
instead ofif
-else
-blockReplace
if
-else
-block withslices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
(SIM108)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
fiftyone/__public__.py
(1 hunks)fiftyone/core/collections.py
(6 hunks)fiftyone/core/dataset.py
(2 hunks)fiftyone/core/stages.py
(16 hunks)fiftyone/core/view.py
(13 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/view.py
🧰 Additional context used
🪛 Ruff (0.8.2)
fiftyone/core/stages.py
5074-5077: Use ternary operator slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
instead of if
-else
-block
Replace if
-else
-block with slices = {self._slices} if etau.is_str(self._slices) else set(self._slices)
(SIM108)
fiftyone/__public__.py
201-201: .core.stages.ExcludeGroupSlices
imported but unused
Remove unused import
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: build / build
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (7)
fiftyone/core/dataset.py (1)
8477-8479
: Good validation to guard against unintended usageRaising a ValueError here is a clear way to prevent unsupported operations on dynamic grouped views. This ensures that the code doesn’t attempt untested or risky clones of such views.
fiftyone/__public__.py (1)
201-201
: LGTM! The import addition looks good.The addition of
ExcludeGroupSlices
to the public interface aligns with the PR objectives of adding support for excluding group slices. The static analysis warning about unused import can be safely ignored as this is a public interface module that intentionally re-exports symbols to make them available in the public API.🧰 Tools
🪛 Ruff (0.8.2)
201-201:
.core.stages.ExcludeGroupSlices
imported but unusedRemove unused import
(F401)
fiftyone/core/stages.py (3)
101-112
: LGTM! Well-documented property addition.The new
flattens_groups
property is well-designed with clear documentation of return values and their meanings.
Line range hint
8884-8932
: LGTM! Proper registration of new stages.The new
ExcludeGroupSlices
stage is correctly registered in both_STAGES
and_STAGES_THAT_SELECT_OR_REORDER
lists.Also applies to: 8949-8949
Line range hint
4618-4955
: Verify the impact of the newflat
parameter on existing code.The addition of the
flat
parameter toSelectGroupSlices
changes its behavior. While the implementation looks correct, we should ensure this change doesn't break existing code that may rely on the previous behavior.✅ Verification successful
✓ The
flat
parameter addition is safe and backward compatibleThe new
flat
parameter maintains backward compatibility by defaulting toTrue
, which preserves the original behavior. All existing usages in the codebase rely on this default behavior, and the change only affects cases whereflat=False
is explicitly specified.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for existing usages of SelectGroupSlices to verify impact rg -A 2 "SelectGroupSlices\("Length of output: 2832
fiftyone/core/collections.py (2)
4621-4694
: LGTM! Well-designed method for excluding group slicesThe implementation provides a clean and intuitive way to exclude group slices by slice name or media type. The documentation and examples are thorough and helpful.
Line range hint
6686-6784
: LGTM! Well-designed method for selecting group slicesThe implementation provides flexible functionality for selecting group slices with the option to return a flattened or grouped view. The documentation and examples clearly demonstrate the usage patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/unittests/server_group_tests.py (2)
149-154
: Add verification of the remaining slices.
Excluding the “three” slice and asserting the resulting length is 2 is correct. However, consider also verifying that the slice names in the resulting dataset match the expected “one” and “two.”
156-161
: Check preserved media type.
Selecting slices “one” and “two” withflat=False
is well tested. Consider also asserting that the resulting view retainsgroup
as its media type for completeness.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fiftyone/server/view.py
(2 hunks)tests/unittests/server_group_tests.py
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/server/view.py
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: e2e / test-e2e
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: build / changes
- GitHub Check: lint / eslint
- GitHub Check: build
🔇 Additional comments (3)
tests/unittests/server_group_tests.py (3)
15-15
: Imports look good.
No conflicts or unused imports identified.
126-132
: Consider validating existing slices.
When adding group slices, consider how duplicate or pre-existing slice names are handled. It may be beneficial to confirm or raise an error if a slice name already exists.
134-147
: Sample creation logic is correct.
Defining separate group elements for each sample ensures the data is properly segmented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. New server tests were added in server_dataset_tests.py
and server_group_tests.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
fiftyone/server/query.py (1)
597-599
: Consider narrowing the exception handling
This catch-allexcept:
block may hide potentially unrelated issues. Narrowing the exception type and re-raising it would provide better debugging and prevent masking other errors.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
fiftyone/core/view.py
(14 hunks)fiftyone/server/query.py
(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/view.py
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.11)
- GitHub Check: test / test-app
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.9)
- GitHub Check: build / build
- GitHub Check: lint / eslint
- GitHub Check: e2e / test-e2e
- GitHub Check: build
🔇 Additional comments (2)
fiftyone/server/query.py (2)
323-323
: Confirm intentional discarding of existing group media type data
By settingdoc["group_media_types"] = []
, any preexisting data for group media types indoc
is overridden. Verify that relying solely on the view’s definitions is indeed desired.
608-613
: Logic confirmed for assigning group media types
This sequence properly populatesdata.group_media_types
fromview._get_group_media_types()
. The approach cleanly aligns with the new workflow for retrieving group-related metadata from views instead of relying on the loaded doc data.
Change log
select_group_slices(..., flat=False)
syntax for selecting group slices w/o flatteningexclude_group_slices()
method for excluding group slices (also w/o flattening, as this is the only behavior that makes sense when excluding group slices in general)TODO
view.group_slices
/view.group_media_types
rather thandataset.group_slices
/dataset.group_media_types
, as the former may no longer always match the latterExample usage
Summary by CodeRabbit
New Features
ExcludeGroupSlices
functionality for filtering group slices.exclude_group_slices
method to theSampleCollection
class for excluding specified group slices.ViewStage
class to control group flattening and media type retrieval.Dataset
class.Bug Fixes
Dataset
class.Tests